-
Notifications
You must be signed in to change notification settings - Fork 134
fix(udb): update backoff algo #3469
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 11-12-chore_write_update_docs
Are you sure you want to change the base?
fix(udb): update backoff algo #3469
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
3 Skipped Deployments
|
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
How to use the Graphite Merge QueueAdd the label merge-queue to this PR to add it to the merge queue. You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
PR Review: fix(udb): update backoff algoSummaryThis PR updates the transaction retry backoff algorithm to more closely align with FoundationDB's behavior. Overall, the changes look good and represent a significant improvement. Code Quality: ✅ GoodStrengths:
Suggestions:
Correctness: ✅ VerifiedAlgorithm Analysis: Old algorithm:
New algorithm:
The math checks out correctly:
Potential Issues:
|
9b19169 to
a0dfaaf
Compare
52bea63 to
ce30dce
Compare
Code Review: fix(udb): update backoff algoSummaryThis PR updates the transaction retry backoff algorithm in UniversalDB to better align with FoundationDB's behavior. Overall, this is a good improvement that makes the backoff more aggressive and consistent with FDB's design. ✅ Positive Changes
🐛 Potential IssuesCritical: Modulo bias in jitter calculationLocation: let jitter_ms = rand::random::<u64>() % base;Issue: Using modulo with Recommendation: Use a proper bounded random range: use rand::Rng;
let jitter_ms = rand::thread_rng().gen_range(0..base);This requires adding Minor: Inconsistency in test fileLocation: The test file has its own copy of // Instead of duplicating, import from the library
use universaldb::calculate_tx_retry_backoff;📊 Behavioral AnalysisOld algorithm:
New algorithm:
The new algorithm is:
🔒 Security ConsiderationsNo security concerns identified. 🧪 Test CoverageNo new tests were added. Consider adding a unit test to verify:
Example test: #[test]
fn test_backoff_algorithm() {
// Test specific attempts
assert_eq!(calculate_tx_retry_backoff(0), 10..=11);
assert_eq!(calculate_tx_retry_backoff(1), 20..=22);
// Test max cap
let backoff_7 = calculate_tx_retry_backoff(7);
let backoff_100 = calculate_tx_retry_backoff(100);
// Both should have same base (capped at 7)
assert!(backoff_7 >= 1280 && backoff_7 < 1408);
assert!(backoff_100 >= 1280 && backoff_100 < 1408);
}📝 RecommendationsRequired:
Optional: Overall Assessment✅ Approved with suggested fixes The core change is sound and improves the backoff algorithm. The modulo bias issue should be addressed before merging, and the test file inconsistency should be resolved. |
Code ReviewSummaryThis PR updates the exponential backoff algorithm for transaction retries in UniversalDB to better align with FoundationDB's implementation. The changes improve the retry behavior by capping the max backoff and adjusting the jitter calculation. Positive Changes ✅
Issues Found 🔍1. CRITICAL: Outdated duplicate function in testsLocation: There's a duplicate pub fn calculate_tx_retry_backoff(attempt: usize) -> u64 {
let base_backoff_ms = 2_u64.pow((attempt as u32).min(10)) * 10;
let jitter_ms = rand::random::<u64>() % 100;
base_backoff_ms + jitter_ms
}Action required: This duplicate should be removed and the test should import from 2. Potential arithmetic issue in jitter calculationLocation: The jitter calculation
The comment says "0-10% of backoff ms" but the actual range is 0 to (base-1)ms, which is slightly less than 10% of Suggested fix: // Jitter is 0-10% of backoff ms
let jitter_ms = rand::random::<u64>() % (base_backoff_ms / 10 + 1);This would give:
Which matches the documentation exactly. 3. Missing unit testsThere don't appear to be any unit tests specifically for
Suggested test: #[test]
fn test_tx_retry_backoff() {
// Test base values without jitter (by checking range)
for attempt in 0..10 {
let backoff = calculate_tx_retry_backoff(attempt);
let expected_base = 2_u64.pow((attempt as u32).min(7)) * 10;
let max_jitter = 2_u64.pow((attempt as u32).min(7));
assert!(backoff >= expected_base);
assert!(backoff < expected_base + max_jitter);
}
// Verify max cap
assert!(calculate_tx_retry_backoff(7) < 1500); // Should be ~1280ms + jitter
assert!(calculate_tx_retry_backoff(100) < 1500); // Should still be capped
}Minor Suggestions 💡
Security Considerations 🔒No security concerns identified. The backoff algorithm is appropriate for preventing retry storms. Performance Considerations ⚡The changes improve performance by reducing max backoff from ~10s to ~1.28s, allowing faster recovery from transient conflicts. VerdictGood improvement to align with FDB behavior, but needs the test duplicate removed and the jitter calculation fixed to match the documentation. Consider adding unit tests for this critical function. 🤖 Generated with Claude Code |
ce30dce to
7e478e6
Compare
a0dfaaf to
9b619f4
Compare
PR Review: fix(udb): update backoff algoSummaryThis PR updates the exponential backoff algorithm for transaction retries in UniversalDB to more closely align with FoundationDB's implementation. The changes improve the retry behavior with better jitter and a more appropriate max backoff. Positive Changes ✅
Issues & Concerns 🚨1. Critical Bug: Jitter Calculation is Incorrect (engine/packages/universaldb/src/utils/mod.rs:46)let jitter_ms = rand::random::<u64>() % base;The comment says "Jitter is 0-10% of backoff ms" but this actually provides 0-100% jitter of For attempt 0:
For attempt 1:
For attempt 7:
Fix: To match the docstring (0-10% of backoff), it should be: let jitter_ms = rand::random::<u64>() % (base + 1);Or update the docstring to reflect actual behavior (0-100% of base, which is 0-10% of base_backoff_ms). 2. Stale Test Function (engine/packages/universaldb/tests/rocksdb.rs:140-146)There's a duplicate implementation of 3. Random Number Generation Concerns (engine/packages/universaldb/src/utils/mod.rs:46)
Minor Observations
RecommendationsMust Fix:
Nice to Have:
Test CoverageThe function is tested indirectly through the rocksdb tests, but would benefit from a dedicated unit test to verify the backoff progression matches the documented behavior. Overall, this is a good improvement to the backoff algorithm, but the jitter calculation bug needs to be fixed before merging. |
7e478e6 to
3c6e472
Compare
9b619f4 to
267ddc0
Compare
PR Review: fix(udb): update backoff algoSummaryThis PR updates the transaction retry backoff algorithm in UniversalDB to better align with FoundationDB's behavior. The changes improve the retry strategy by capping attempts at 7 (resulting in max ~1.4s) and using proportional jitter. Code Quality & Best PracticesPositive
Issues Found 1. Critical: Inconsistent Implementation in Test File
pub fn calculate_tx_retry_backoff(attempt: usize) -> u64 {
let base_backoff_ms = 2_u64.pow((attempt as u32).min(10)) * 10;
let jitter_ms = rand::random::<u64>() % 100;
base_backoff_ms + jitter_ms
}Impact: Tests are using the old backoff logic, making them inconsistent with production code. Fix: Remove the duplicate function from the test file and import it from the main module instead: use universaldb::calculate_tx_retry_backoff;2. Bug: Incorrect Jitter CalculationLine 46: Issue: The comment says "Jitter is 0-10% of backoff ms" but the code implements 0-base where base is Examples:
Wait, the code is actually correct - the jitter IS proportional to the base. However, the comment is misleading:
Clarity Fix: Consider updating the comment to be more explicit: // Jitter is 0-10% of backoff ms (ranges from 0 to base milliseconds)
let jitter_ms = rand::random::<u64>() % base;Performance ConsiderationsImproved Performance
Potential Concern: The reduced max backoff might lead to more aggressive retries under high contention. However, this aligns with FDB's design, so it should be acceptable. Security ConcernsNo security issues identified. The use of Test CoverageMissing
Recommendation: Add tests like: #[test]
fn test_backoff_progression() {
// Test that backoff values are within expected ranges
for attempt in 0..10 {
let backoff = calculate_tx_retry_backoff(attempt);
let expected_base = 2_u64.pow((attempt as u32).min(7)) * 10;
let expected_max_jitter = 2_u64.pow((attempt as u32).min(7));
assert!(backoff >= expected_base);
assert!(backoff < expected_base + expected_max_jitter);
}
}
#[test]
fn test_backoff_max_cap() {
// Verify attempts beyond 7 use same base as attempt 7
let backoff_7_base = calculate_tx_retry_backoff(7) / 10 * 10;
let backoff_100_base = calculate_tx_retry_backoff(100) / 10 * 10;
// Both should have same base (1280ms), only jitter differs
assert_eq!(backoff_7_base, backoff_100_base);
}Action Items
VerdictThe core algorithm change is correct and improves alignment with FDB, but the duplicate function in tests needs to be addressed before merging. |
267ddc0 to
650a53e
Compare
PR Review: fix(udb): update backoff algoThis PR updates the transaction retry backoff algorithm in UniversalDB to more closely align with FoundationDB implementation. Overall, this is a solid improvement with better documentation and cleaner code organization. Strengths
Critical Issue: Potential Arithmetic BugIn engine/packages/universaldb/src/utils/mod.rs:46, the jitter calculation has a mismatch between code and comment. The comment says "Jitter is 0-10% of backoff ms" but the code does rand::random::() % base which gives 0 to base-1, not 0-10% of base_backoff_ms. Recommendation: Based on your examples in the docs showing "0 -> 10ms + 0-1ms jitter" where base=1, the code seems correct. Update the comment to match: "Jitter is 0 to (base-1) ms" Other ObservationsTypeScript Changes: Moving assert import to Node built-in is good. The TODO comment about renaming createTs to update_ts is helpful - consider creating a tracking issue. Performance Impact: The new algorithm reduces maximum wait time from ~10s to ~1.3s which improves user experience and system responsiveness under contention. Test Coverage: Well-tested in production code. Consider adding unit tests for calculate_tx_retry_backoff to verify backoff values and jitter bounds. Recommendations
SummaryWell-thought-out improvement. The only blocker is clarifying the jitter calculation to ensure code and comments match. Once resolved, this should be good to merge. Review generated with Claude Code |

No description provided.